@aktondak
Contributor

@aktondak aktondak commented Sep 11, 2025

This PR updates the XRT submodule to the latest version. It contains the following changes:

  1. Adds shim code to query archive path from installed driver path (XRT : [XRT-SMI] Archive migration  Xilinx/XRT#9213)
  2. Edits shim code to consume watch mode flag passed down from xrt-smi for event tracing and firmware logging. (XRT : [XRT-SMI]Watch mode flag propagation to driver Xilinx/XRT#9218)
  3. Updates context health report to get ctx id and PID from driver. (XRT : [XRT-SMI] Context health report update Xilinx/XRT#9221)

maxzhen
maxzhen previously approved these changes Sep 11, 2025
CMake/pkg.cmake Outdated
)

# Install VTD runner archive file
install(FILES ${CMAKE_CURRENT_SOURCE_DIR}/VTD/runner/xrt_smi_strx.a
Collaborator

This is only for stx, not phx? What about phx?

Contributor Author

I wanted to do it in phases and add phoenix later. But now that I think of it, this PR might fail the pipeline because of an xrt-smi validate failure on phoenix.
Let me update this PR to add the phoenix archive as well once this PR fails the pipeline with the expected behavior.

Collaborator

> I wanted to do it in phases and add phoenix later. But now that I think of it, this PR might fail the pipeline because of an xrt-smi validate failure on phoenix. Let me update this PR to add the phoenix archive as well once this PR fails the pipeline with the expected behavior.

Yeah, it may fail. It would be good if you could also add the phx .a file.

Signed-off-by: Akshay Tondak <[email protected]>
xdavidz and others added 7 commits September 16, 2025 17:15
…d#733)

- Introduce a single DPT-based infrastructure for firmware
  debug/profile/trace across current and future devices. Remove legacy
  event-trace/DRAM logging to cut redundancy.
- Improve ring handling by extending 32-bit FW pointers to 64-bit
  in-driver, making wrap and tail tracking robust and transparent
- Enable firmware logging by default at ERROR level. Simplify
  usage via a boolean debugfs node (dump_fw_log) to toggle printing
  to dmesg.
- Provide module params for advanced control (fw_log_level, fw_log_size,
  poll_fw_log).
- Establish common management-DMA helpers for buffer allocation/handling.
  More consolidation to follow.
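
The 32-bit to 64-bit pointer extension described above can be sketched roughly as follows. This is a minimal user-space illustration, not the driver's code: `Ptr64Tracker` and `ring_offset` are hypothetical names, and the sketch assumes the firmware pointer is a free-running 32-bit counter that advances by less than 2^32 between two samples.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: tracks a free-running 32-bit firmware pointer as a
// monotonically increasing 64-bit position.
struct Ptr64Tracker {
    uint64_t value = 0;  // extended 64-bit position

    // Fold a fresh 32-bit sample into the 64-bit position.
    // Assumption: the pointer moved by less than 2^32 since the last sample.
    uint64_t update(uint32_t fw_ptr) {
        // Unsigned subtraction is modulo 2^32, so a wrapped sample still
        // yields the correct forward distance.
        uint32_t delta = fw_ptr - static_cast<uint32_t>(value);
        value += delta;
        return value;
    }
};

// Byte offset of a 64-bit position inside a ring of ring_size bytes.
inline uint64_t ring_offset(uint64_t pos, uint64_t ring_size) {
    assert(ring_size != 0);
    return pos % ring_size;
}
```

With a 64-bit position, the consumer can compare head and tail directly without special-casing the 32-bit wrap, and only reduces the position modulo the ring size when indexing the buffer.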

Signed-off-by: Nishad Saraf <[email protected]>
Fix coverity use after free.

Signed-off-by: Nishad Saraf <[email protected]>
Signed-off-by: advanaik <[email protected]>
Signed-off-by: David Zhang <[email protected]>
Co-authored-by: advanaik <[email protected]>
* VTD submod removal

Signed-off-by: Akshay Tondak <[email protected]>

* subdir removal

Signed-off-by: Akshay Tondak <[email protected]>

---------

Signed-off-by: Akshay Tondak <[email protected]>
Signed-off-by: Akshay Tondak <[email protected]>
@maxzhen maxzhen merged commit cbbfbfd into amd:main Sep 17, 2025
1 check passed
@aktondak aktondak deleted the archive_support branch September 23, 2025 07:16
xdavidz added a commit that referenced this pull request Oct 9, 2025
* Unify DPT (Debug/Profile/Trace) firmware debug across generations (#733)

- Introduce a single DPT-based infrastructure for firmware
  debug/profile/trace across current and future devices. Remove legacy
  event-trace/DRAM logging to cut redundancy.
- Improve ring handling by extending 32-bit FW pointers to 64-bit
  in-driver, making wrap and tail tracking robust and transparent
- Enable firmware logging by default at ERROR level. Simplify
  usage via a boolean debugfs node (dump_fw_log) to toggle printing
  to dmesg.
- Provide module params for advanced control (fw_log_level, fw_log_size,
  poll_fw_log).
- Establish common management-DMA helpers for buffer allocation/handling.
  More consolidation to follow.

Signed-off-by: Nishad Saraf <[email protected]>

* Fix coverity use after free (#744)

Fix coverity use after free.

Signed-off-by: Nishad Saraf <[email protected]>

* shim changes for dump log pr (#738)


Signed-off-by: advanaik <[email protected]>
Signed-off-by: David Zhang <[email protected]>
Co-authored-by: advanaik <[email protected]>

* VTD submodule removal (#745)

* VTD submod removal

Signed-off-by: Akshay Tondak <[email protected]>

* subdir removal

Signed-off-by: Akshay Tondak <[email protected]>

---------

Signed-off-by: Akshay Tondak <[email protected]>

* Fix hangs while querying HW context report (#741)

Fix hangs while querying HW context report on platforms that do not
support app health.

Signed-off-by: Nishad Saraf <[email protected]>

* XRT, VTD submodule update and shim changes (#737)

* XRT, VTD submodule update and shim changes

Signed-off-by: Akshay Tondak <[email protected]>

* VTD update

Signed-off-by: Akshay Tondak <[email protected]>

* use last firmware for verbose in-memory log (#739)

Signed-off-by: David Zhang <[email protected]>

* Unify DPT (Debug/Profile/Trace) firmware debug across generations (#733)

- Introduce a single DPT-based infrastructure for firmware
  debug/profile/trace across current and future devices. Remove legacy
  event-trace/DRAM logging to cut redundancy.
- Improve ring handling by extending 32-bit FW pointers to 64-bit
  in-driver, making wrap and tail tracking robust and transparent
- Enable firmware logging by default at ERROR level. Simplify
  usage via a boolean debugfs node (dump_fw_log) to toggle printing
  to dmesg.
- Provide module params for advanced control (fw_log_level, fw_log_size,
  poll_fw_log).
- Establish common management-DMA helpers for buffer allocation/handling.
  More consolidation to follow.

Signed-off-by: Nishad Saraf <[email protected]>

* Fix coverity use after free (#744)

Fix coverity use after free.

Signed-off-by: Nishad Saraf <[email protected]>

* shim changes for dump log pr (#738)


Signed-off-by: advanaik <[email protected]>
Signed-off-by: David Zhang <[email protected]>
Co-authored-by: advanaik <[email protected]>

* VTD submodule removal (#745)

* VTD submod removal

Signed-off-by: Akshay Tondak <[email protected]>

* subdir removal

Signed-off-by: Akshay Tondak <[email protected]>

---------

Signed-off-by: Akshay Tondak <[email protected]>

* Report addition

Signed-off-by: Akshay Tondak <[email protected]>

---------

Signed-off-by: Akshay Tondak <[email protected]>
Signed-off-by: David Zhang <[email protected]>
Signed-off-by: Nishad Saraf <[email protected]>
Signed-off-by: advanaik <[email protected]>
Co-authored-by: David Zhang <[email protected]>
Co-authored-by: Nishad Saraf <[email protected]>
Co-authored-by: advanaik <[email protected]>

* fix ubuf failure when iommu_mode=1 (#742)

Signed-off-by: Lizhi Hou <[email protected]>

* Validate changes required for telluride (#736)

Signed-off-by: Manoj Takasi <[email protected]>

* Added proper cleanup function for imported bos (#743)

Signed-off-by: Manoj Takasi <[email protected]>

* General housekeeping (#749)

General housekeeping.

Signed-off-by: Nishad Saraf <[email protected]>

* App health test wait forever and expect TDR (#751)

App health test wait forever and expect TDR.

Signed-off-by: Nishad Saraf <[email protected]>

* Make FW log parser device specific (#752)

FW log buffer format may vary based on the device generation. Make the
parser logic device specific.
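
One way to make a parser device specific is to select a parse function per device generation. The sketch below is only an illustration under stated assumptions: `DeviceGen`, the two record layouts, and every function name are hypothetical and do not reflect the real firmware log formats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

enum class DeviceGen { GenA, GenB };  // hypothetical generations

using LogParser = std::string (*)(const uint8_t* buf, size_t len);

// Hypothetical layout A: [level][text...]
static std::string parse_gen_a(const uint8_t* buf, size_t len) {
    if (len < 1) return {};
    return "L" + std::to_string(buf[0]) + ":" +
           std::string(reinterpret_cast<const char*>(buf + 1), len - 1);
}

// Hypothetical layout B: [level][text length][text...]
static std::string parse_gen_b(const uint8_t* buf, size_t len) {
    if (len < 2) return {};
    size_t n = buf[1];
    if (n > len - 2) n = len - 2;  // clamp to the bytes actually present
    return "L" + std::to_string(buf[0]) + ":" +
           std::string(reinterpret_cast<const char*>(buf + 2), n);
}

// Pick the parser that matches the device generation.
inline LogParser parser_for(DeviceGen gen) {
    switch (gen) {
    case DeviceGen::GenA: return parse_gen_a;
    case DeviceGen::GenB: return parse_gen_b;
    }
    return nullptr;
}

inline std::string parse_log(DeviceGen gen, const uint8_t* buf, size_t len) {
    LogParser p = parser_for(gen);
    assert(p != nullptr);
    return p(buf, len);
}
```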

Signed-off-by: Nishad Saraf <[email protected]>

* add dbg bo sync (#753)

Signed-off-by: Max Zhen <[email protected]>

* Added bo export support in VE2 (#757)

Signed-off-by: Bikash Singha <[email protected]>
Co-authored-by: Bikash Singha <[email protected]>

* Update XRT package to 202520.2.20.152 (#756)

* Add <iostream> header where it is required

After the XRT package update, some XRT header files no longer include
<iostream>. Do not rely on XRT headers to pull it in; include <iostream>
directly wherever it is needed.
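
As a small illustration of the include rule in this commit message, a source file that uses `std::cout` names `<iostream>` itself. The helper names `log_ctx` and `ctx_to_string` are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <iostream>  // included directly, not via some other header
#include <sstream>
#include <string>

// Hypothetical helper that prints a context id; defaults to std::cout,
// which is why this file must include <iostream> on its own.
inline void log_ctx(int ctx_id, std::ostream& os = std::cout) {
    assert(ctx_id >= 0);  // this sketch assumes non-negative ids
    os << "ctx " << ctx_id << '\n';
}

// Same output captured in a string, convenient for checks.
inline std::string ctx_to_string(int ctx_id) {
    std::ostringstream oss;
    log_ctx(ctx_id, oss);
    return oss.str();
}
```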

Signed-off-by: Wendy Liang <[email protected]>

* xrt: update xrt version to 202520.2.20.152

Update XRT package version to 202520.2.20.152

Signed-off-by: Wendy Liang <[email protected]>

---------

Signed-off-by: Wendy Liang <[email protected]>

* Add xrt_test option for vf concurrency test (#732)

Signed-off-by: Hayden Laccabue <[email protected]>

* XDNA driver cache last async error and provide ioctl to enable user to get the async error (#740)

* amdxdna: aie2_smu: remove busy wait until SMU_RESP_REG before submit commands

aie2_smu_exec() is the only function in the XDNA driver that submits SMU
commands, and it waits until each command has finished. It also holds a lock
around SMU register access, and the NPU SMU is used only by xdna. Remove the
poll that waited for SMU_RESP_REG to clear before committing a new SMU command.

Signed-off-by: Wendy Liang <[email protected]>

* xdna: driver: cache last async error and add ioctl to get last async error

Cache the last async error received from the device, and implement an ioctl that
returns it as the encoded error code defined in the XRT layer, together with the
timestamp in microseconds of when the driver received the event.
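
The caching idea can be sketched in user-space C++ as below. This is an assumption-laden illustration rather than the driver's implementation: `AsyncError`, `AsyncErrorCache`, and their members are hypothetical names, and the real driver exposes the value through an ioctl instead of a method call.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical record: encoded error code plus receive time in microseconds.
struct AsyncError {
    uint64_t err_code;
    uint64_t ts_us;
};

class AsyncErrorCache {
public:
    // Called when an async error event arrives; keeps only the latest one.
    void record(uint64_t err_code, uint64_t ts_us) {
        std::lock_guard<std::mutex> guard(lock_);
        last_ = AsyncError{err_code, ts_us};
        valid_ = true;
    }

    // Copies the last cached error into *out.
    // Returns false when no error has been cached yet.
    bool get_last(AsyncError* out) const {
        assert(out != nullptr);
        std::lock_guard<std::mutex> guard(lock_);
        if (!valid_)
            return false;
        *out = last_;
        return true;
    }

private:
    mutable std::mutex lock_;
    AsyncError last_{0, 0};
    bool valid_ = false;
};
```

Keeping only the most recent error makes the query cheap and bounded in size, at the cost of losing earlier errors when several arrive before a read.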

Signed-off-by: Wendy Liang <[email protected]>

* shim: implement xocl_errors query to get last async error

Implement xocl_errors query to get the last async error from device through
XDNA driver ioctl.

Signed-off-by: Wendy Liang <[email protected]>

* test: shim: add async error verification

Add async error verification to test async events generated
by hardware, using the get array ioctl to get the last
async error.

Because phx firmware behaves differently in when it clears async
errors, limit the test to npu4.

Signed-off-by: Wendy Liang <[email protected]>

---------

Signed-off-by: Wendy Liang <[email protected]>

* Change print to debug level (#758)

Change print to debug level.

Signed-off-by: Nishad Saraf <[email protected]>

* driver: amdxdna: error: get last error add missing unlock (#760)

Add missing unlock when there is no cached error.
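
The class of bug fixed here, an early return that skips the unlock, is commonly avoided with a single exit path. The following is a hypothetical sketch of that pattern, not the driver's code: `ErrState`, `get_last_error`, and the `-ENOENT` return value are assumptions made for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <mutex>

// Hypothetical state guarded by a lock.
struct ErrState {
    std::mutex lock;
    bool has_error = false;
    uint64_t err_code = 0;
};

// Returns 0 and fills *out when an error is cached.
// Returns -ENOENT (an assumed code) when nothing is cached.
// Every path leaves through the "unlock" label, so the lock is always released.
inline int get_last_error(ErrState& st, uint64_t* out) {
    assert(out != nullptr);
    int ret = 0;

    st.lock.lock();
    if (!st.has_error) {
        ret = -ENOENT;
        goto unlock;  // early exit still releases the lock
    }
    *out = st.err_code;
unlock:
    st.lock.unlock();
    return ret;
}
```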

Signed-off-by: Wendy Liang <[email protected]>

* moving to latest xrt for import/export bo issues across xdna & zocl (#761)

Co-authored-by: Ch Vamshi Krishna <[email protected]>

* Create support for FLR (#746)

Signed-off-by: Hayden Laccabue <[email protected]>

* switch to latest umq version (#759)

Signed-off-by: David Zhang <[email protected]>

* Updated bo allocation with AMDXDNA_BO_SHARE type (#763)

Signed-off-by: Bikash Singha <[email protected]>
Co-authored-by: Bikash Singha <[email protected]>

* Fix return value for aie-partitions query (#766)

Fix return value for aie-partitions query.


(cherry picked from commit 86e76e6)

Signed-off-by: Nishad Saraf <[email protected]>

* create cma bo with carvedout fashion for ve2 (#748)

Signed-off-by: Bikash Singha <[email protected]>
Co-authored-by: Bikash Singha <[email protected]>

* [XRT-SMI] Archive migration (#764)

Signed-off-by: Akshay Tondak <[email protected]>

* fix uninitialized variable (#769)

Signed-off-by: Max Zhen <[email protected]>

* Removing redundant files (#768)

Signed-off-by: Akshay Tondak <[email protected]>

* Remove device specific mgmt buffer APIs (#771)

Remove device specific mgmt buffer APIs.

Signed-off-by: Nishad Saraf <[email protected]>

* test: shim: add read async error multi times in multi threads (#772)

Add a test case that reads async errors multiple times from multiple
threads.

To test reading async errors when there are no errors, the test must
run after the xdna module is probed and before any runs are
launched on the hardware.

Signed-off-by: Wendy Liang <[email protected]>

* Updated xdna_bo.cpp to maintain bo ref count if we create on the same process (#774)

Signed-off-by: Manoj Takasi <[email protected]>

* Fix uninitialized app health report pointer (#775)

Fix uninitialized app health report pointer.

Signed-off-by: Nishad Saraf <[email protected]>

* Free buffer on failure and minor fixes (#776)

Free buffer on failure and minor fixes.

Signed-off-by: Nishad Saraf <[email protected]>

* test: shim: add instruction code invalid address access test (#773)

Add a test case in which instruction code accesses an invalid address.
The run is expected to time out; a good run started afterwards
should then finish properly.

Signed-off-by: Wendy Liang <[email protected]>

* Telluride opensrc (#770)

* Adding temporal_sharing code and other latest changes into xdna repo

Signed-off-by: Saifuddin Kaijar <[email protected]>

* Fixed the review comments

Signed-off-by: Saifuddin Kaijar <[email protected]>

* Fixed the review comments

Signed-off-by: Saifuddin Kaijar <[email protected]>

* Fixing the coding style issue

Signed-off-by: Saifuddin Kaijar <[email protected]>

* Fixed code style issue for ve2_mgmt.c file

Signed-off-by: Kaijar, Saifuddin <[email protected]>

* Fixed code style issue for ve2_mgmt.c file 1

Signed-off-by: Kaijar, Saifuddin <[email protected]>

* Fixed code style issue for ve2_hwctx.c file v1

Signed-off-by: Kaijar, Saifuddin <[email protected]>

* Fixed code style issue for ve2  file v1

Signed-off-by: Kaijar, Saifuddin <[email protected]>

* Fixed code style issue for ve2  file v1

Signed-off-by: Kaijar, Saifuddin <[email protected]>

* Fixed code style issue for ve2  file v2

Signed-off-by: Kaijar, Saifuddin <[email protected]>

* Fixed code style issue for ve2  file v3

Signed-off-by: Kaijar, Saifuddin <[email protected]>

* Fixed coding style issues

Signed-off-by: Saifuddin Kaijar <[email protected]>

* Fixed issues after amdxdna_cma memory allocation introduced

Signed-off-by: Saifuddin Kaijar <[email protected]>

* Fixed review comments

Signed-off-by: Saifuddin Kaijar <[email protected]>

* Fixed one coding style issue

Signed-off-by: Saifuddin Kaijar <[email protected]>

* Just a dummy change in amdxdna files to force the driver to build again

Signed-off-by: Saifuddin Kaijar <[email protected]>

* Remove volatile keyword from ve2 driver

Signed-off-by: Saifuddin Kaijar <[email protected]>

* Removed another codingsty_check

Signed-off-by: Saifuddin Kaijar <[email protected]>

---------

Signed-off-by: Saifuddin Kaijar <[email protected]>
Signed-off-by: Kaijar, Saifuddin <[email protected]>

* Properly support exporting and importing BO in the same process (#778)

Signed-off-by: Max Zhen <[email protected]>

* fixing qos property in ve2 (#779)

Co-authored-by: Ch Vamshi Krishna <[email protected]>

* disable force preemption test in virtio environment (#782)

Signed-off-by: Max Zhen <[email protected]>

* Add debug prints and fix typo (#784)

Add debug prints and fix typo.

Signed-off-by: Nishad Saraf <[email protected]>

* Update get async error ioctl to return only 0 for success case (#783)

* amdxdna: get async error returns only 0 for success

This patch has three changes:
* return 0 only when get async error succeeds
* restructure the aie2 get array ioctl implementation to separate getting async
  error information from getting other hardware context
  information.
* remove the aie2_error get async error implementation, as it is just
  a very thin wrapper that is not necessary.

Signed-off-by: Wendy Liang <[email protected]>

* shim: specify only one element to get async error information

As the information array for getting the last async error holds only
one element, change the number of elements to 1.

---------

Signed-off-by: Wendy Liang <[email protected]>

* Add debugfs node to dump raw fw log buffer (#785)

Add debugfs node to dump raw fw log buffer.

Signed-off-by: Nishad Saraf <[email protected]>

* Fix NULL pointer dereference for invalid sequence number (#789)

Fix NULL pointer dereference for invalid sequence number.

Signed-off-by: Nishad Saraf <[email protected]>

* Fix SMU power off issue (#787)

* Fix SMU power off issue

* Fix SMU power off issue

* Fix SMU power off issue

* Revert "Fix SMU power off issue"

This reverts commit af1867a.

* Fix SMU power off issue

* update testcase timeout value and check (#777)

* timeout value change; compare and dump ofm after timeout

Signed-off-by: advanaik <[email protected]>

* dump ofm to file after timeout

Signed-off-by: advanaik <[email protected]>

* yolov3 host code

Signed-off-by: advanaik <[email protected]>

---------

Signed-off-by: advanaik <[email protected]>

* dophine pass

Signed-off-by: David Zhang <[email protected]>

* pasid fix

Signed-off-by: David Zhang <[email protected]>

---------

Signed-off-by: Nishad Saraf <[email protected]>
Signed-off-by: advanaik <[email protected]>
Signed-off-by: David Zhang <[email protected]>
Signed-off-by: Akshay Tondak <[email protected]>
Signed-off-by: Lizhi Hou <[email protected]>
Signed-off-by: Manoj Takasi <[email protected]>
Signed-off-by: Max Zhen <[email protected]>
Signed-off-by: Bikash Singha <[email protected]>
Signed-off-by: Wendy Liang <[email protected]>
Signed-off-by: Hayden Laccabue <[email protected]>
Signed-off-by: Saifuddin Kaijar <[email protected]>
Signed-off-by: Kaijar, Saifuddin <[email protected]>
Co-authored-by: Nishad Saraf <[email protected]>
Co-authored-by: advanaik <[email protected]>
Co-authored-by: Akshay Tondak <[email protected]>
Co-authored-by: Lizhi Hou <[email protected]>
Co-authored-by: Manoj Takasi <[email protected]>
Co-authored-by: Max Zhen <[email protected]>
Co-authored-by: Bikash Singha <[email protected]>
Co-authored-by: Bikash Singha <[email protected]>
Co-authored-by: Wendy Liang <[email protected]>
Co-authored-by: Hayden Laccabue <[email protected]>
Co-authored-by: Ch Vamshi Krishna <[email protected]>
Co-authored-by: Ch Vamshi Krishna <[email protected]>
Co-authored-by: Saifuddin Kaijar <[email protected]>
Co-authored-by: amd-kirkirov <[email protected]>
Co-authored-by: AdvaitNaik <[email protected]>
4 participants